In this notebook, a template is provided for you to implement, in stages, the functionality required to successfully complete this project. If additional code is required that cannot be included in the notebook, be sure that the Python code is successfully imported and included in your submission. Sections that begin with 'Implementation' in the header indicate where you should begin your implementation for your project. Note that some sections of implementation are optional, and will be marked with 'Optional' in the header.
In addition to implementing code, there will be questions that you must answer which relate to the project and your implementation. Each section where you will answer a question is preceded by a 'Question' header. Carefully read each question and provide thorough answers in the following text boxes that begin with 'Answer:'. Your project submission will be evaluated based on your answers to each of the questions and the implementation you provide.
Note: Code and Markdown cells can be executed using the Shift + Enter keyboard shortcut. In addition, Markdown cells can be edited by double-clicking the cell to enter edit mode.
# Load pickled data
import pickle
# TODO: Fill this in based on where you saved the training and testing data
training_file = "train.p"
testing_file = "test.p"
sign_key_file = "human_readable_label.p"
with open(training_file, mode='rb') as f:
train = pickle.load(f)
with open(testing_file, mode='rb') as f:
test = pickle.load(f)
with open(sign_key_file, mode='rb') as f:
human_readable_label = pickle.load(f)
X_raw, y_raw = train['features'], train['labels']
X_test, y_test = test['features'], test['labels']
assert(len(X_raw) == len(y_raw))
assert(len(X_test) == len(y_test))
print("Import Complete")
print("{} samples imported".format(len(X_raw) + len(X_test)))
print("Test label dict: ",human_readable_label[1])
The pickled data is a dictionary with 4 key/value pairs:
- 'features' is a 4D array containing raw pixel data of the traffic sign images, (num examples, width, height, channels).
- 'labels' is a 2D array containing the label/class id of the traffic sign. The file signnames.csv contains id -> name mappings for each id.
- 'sizes' is a list containing tuples, (width, height), representing the original width and height of each image.
- 'coords' is a list containing tuples, (x1, y1, x2, y2), representing the coordinates of a bounding box around the sign in the image. These coordinates assume the original image; the pickled data contains resized (32 by 32) versions of these images.

Complete the basic data summary below.
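As a sketch of the expected layout, here is a miniature stand-in dictionary with placeholder values (the `_demo` names and all values are invented for illustration; they are not real dataset contents):

```python
import numpy as np

# Hypothetical miniature stand-in for the pickled dictionary, following the
# 4-key layout described above. All values are placeholders.
train_demo = {
    'features': np.zeros((3, 32, 32, 3), dtype=np.uint8),  # (num examples, width, height, channels)
    'labels': np.array([0, 1, 1]),                          # class id per example
    'sizes': [(45, 50), (60, 62), (33, 35)],                # original (width, height) of each image
    'coords': [(5, 5, 40, 45), (6, 6, 55, 57), (2, 2, 30, 32)],  # bbox in the original image
}

# The same summary statistics computed below for the real data:
n_train_demo = len(train_demo['features'])
image_shape_demo = train_demo['features'][0].shape
n_classes_demo = len(set(train_demo['labels']))
```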
### Replace each question mark with the appropriate value.
# Number of training examples
n_train = len(X_raw)
# Number of testing examples.
n_test = len(X_test)
# What's the shape of a traffic sign image?
image_shape = X_raw[0].shape
# How many unique classes/labels are there in the dataset?
n_classes = len(set(y_raw))
print("Number of training examples =", n_train)
print("Number of testing examples =", n_test)
print("Image data shape =", image_shape)
print("Number of classes =", n_classes)
Visualize the German Traffic Signs Dataset using the pickled file(s). This is open ended, suggestions include: plotting traffic sign images, plotting the count of each sign, etc.
The Matplotlib examples and gallery pages are a great resource for doing visualizations in Python.
NOTE: It's recommended you start with something simple first. If you wish to do more, come back to it after you've completed the rest of the sections.
### Data exploration visualization goes here.
### Feel free to use as many code cells as needed.
import matplotlib.pyplot as plt
import random
import numpy as np
import math
# Visualizations will be shown in the notebook.
%matplotlib inline
def find_signs(sign_labels, sign_type):
""" Find all signs of given type and return their index
sign_labels -- The list of labels associated with the sign images
sign_type -- the type of sign to return. Numeric
"""
return [i for i, j in enumerate(sign_labels) if j == sign_type]
def show_signs(signs, labels, index, title, show_index = True, show_label = True):
""" Horizontal stack a list of signs together for review
signs -- list, images
labels -- list, sign labels
index -- list, signs to print
show_index -- bool, show the index of the first sign? default True
show_label -- bool, show the label of the first sign? default True
"""
img = signs[index[0]].squeeze()
for i in range(1, len(index)):
next_img = signs[index[i]]
img = np.hstack((img, next_img))
plt.figure(figsize=(12,1))
if show_label:
plt.xlabel(human_readable_label[labels[index[0]]])
if show_index:
plt.ylabel(index[0])
plt.title(title)
plt.tick_params(
axis='both',
which='both',
bottom=False,
top=False,
left=False,
right=False,
labelleft=False,
labelbottom=False)
plt.imshow(img)
plt.show()
# print(index)
for sign_type in range(n_classes):
index = random.sample(find_signs(y_raw, sign_type), 12)
show_signs(X_raw, y_raw, index, "", False)
# Taking a look at the number of each sign type in the training set
training_sign_count = []
test_sign_count = []
for i in range(n_classes):
training_sign_count.append(sum(y_raw == i))
test_sign_count.append(sum(y_test == i))
# Look at the distribution of training signs
fig = plt.figure()
fig.set_size_inches(12, 1)
ax = plt.subplot(111)
width=0.5
ax.bar(range(n_classes), training_sign_count, width = width)
ax.set_xticks(np.arange(n_classes) + width/2)
ax.set_xticklabels(human_readable_label.values(), rotation=90)
plt.title("Sign distribution in the training set")
plt.show()
# Look at the distribution of test signs
fig = plt.figure()
fig.set_size_inches(12, 1)
ax = plt.subplot(111)
width=0.5
ax.bar(range(n_classes), test_sign_count, width = width)
ax.set_xticks(np.arange(n_classes) + width/2)
ax.set_xticklabels(human_readable_label.values(), rotation=90)
plt.title("Sign distribution in the test set")
plt.show()
# Look at the percentage of training vs. test
percent_training = [training/(training + test) for training, test in zip(training_sign_count, test_sign_count)]
fig = plt.figure()
fig.set_size_inches(12, 1)
ax = plt.subplot(111)
width=0.5
ax.bar(range(n_classes), percent_training, width = width)
ax.set_xticks(np.arange(n_classes) + width/2)
ax.set_xticklabels(human_readable_label.values(), rotation=90)
plt.title("Compare the percent of training signs from total")
plt.show()
Design and implement a deep learning model that learns to recognize traffic signs. Train and test your model on the German Traffic Sign Dataset.
There are various aspects to consider when thinking about this problem:
Here is an example of a published baseline model on this problem. It's not required to be familiar with the approach used in the paper, but it's good practice to try to read papers like these.
NOTE: The LeNet-5 implementation shown in the classroom at the end of the CNN lesson is a solid starting point. You'll have to change the number of classes and possibly the preprocessing, but aside from that it's plug and play!
Use the code cell (or multiple code cells, if necessary) to implement the first step of your project. Once you have completed your implementation and are satisfied with the results, be sure to thoroughly answer the questions that follow.
### Preprocess the data here.
### Feel free to use as many code cells as needed.
from sklearn.utils import shuffle
import cv2
import numpy as np
### Equalize the brightness
def equalize_image(image):
""" Histogram equalize the brightness
Convert to YUV
Histogram equalize the Y channel
Convert back to RGB
"""
# convert image to YUV (the pickled images are RGB)
img_yuv = cv2.cvtColor(image, cv2.COLOR_RGB2YUV)
# equalize the histogram of the Y channel
img_yuv[:,:,0] = cv2.equalizeHist(img_yuv[:,:,0])
# equalize the histogram of the U channel
# img_yuv[:,:,1] = cv2.equalizeHist(img_yuv[:,:,1])
# equalize the histogram of the V channel
# img_yuv[:,:,2] = cv2.equalizeHist(img_yuv[:,:,2])
# convert the YUV image back to RGB format
return cv2.cvtColor(img_yuv, cv2.COLOR_YUV2RGB)
# return img_yuv
# Apply equalization
X_equalized = [equalize_image(image) for image in X_raw]
X_test_equalized = [equalize_image(image) for image in X_test]
print("Brightness equalization complete.")
### Examine the equalization
index = random.randint(0, len(X_raw) - 1)
# index = 32643 # Black
index = [index, index+len(X_raw)]
show_signs(list(X_raw) + list(X_equalized), y_raw, index, "Original compared to Equalized")
Describe how you preprocessed the data. Why did you choose that technique?
Answer:
The only preprocessing I have done is to account for the large variation in image brightness and contrast. Some images are so dark that I could not see the sign. To accomplish this I converted each image to the YUV colorspace, applied OpenCV's histogram equalization to the Y channel, then converted the image back to RGB. This did a great job of producing images that look like they were taken in similar light.
The reason I did this equalization is that some of the images were so dark I confused them with empty images. While the network should be able to learn to classify images in any lighting condition, removing that burden from the network with a simple preprocessing step seems like an obvious win. I can easily apply the same equalization to any incoming image, and it saves the network from having to discover the correlation between low and high contrast and brightness.
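For illustration, histogram equalization of a single channel can be sketched in pure numpy (this approximates `cv2.equalizeHist`, which the actual preprocessing uses; OpenCV's exact rounding may differ slightly):

```python
import numpy as np

def equalize_channel(channel):
    """Histogram-equalize a single uint8 channel (numpy-only sketch of what
    cv2.equalizeHist does to the Y channel)."""
    hist = np.bincount(channel.ravel(), minlength=256)
    cdf = hist.cumsum()
    cdf_min = cdf[cdf > 0][0]
    # map each intensity so the output histogram is approximately uniform
    lut = np.round((cdf - cdf_min) / (channel.size - cdf_min) * 255).astype(np.uint8)
    return lut[channel]

# a synthetic "dark" channel: all intensities squeezed into [0, 60]
dark = np.linspace(0, 60, 32 * 32).reshape(32, 32).astype(np.uint8)
bright = equalize_channel(dark)  # intensities stretched across [0, 255]
```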
### Generate additional data (OPTIONAL!)
### and split the data into training/validation/testing sets here.
### Feel free to use as many code cells as needed.
# Function to apply some random rotations, translations and skew.
def random_transform(image, transform_limit):
rows,cols,ch = image.shape
pts1 = np.float32([[0,0], [0,cols], [rows,0]])
# rand_pts = [random.randrange(-transform_limit, transform_limit) for i in range(6)]
rand_pts = [random.gauss(0.0, transform_limit) for i in range(6)]
pts2 = np.float32([
[rand_pts[0], rand_pts[1]],
[rand_pts[2], cols - rand_pts[3]],
[rows - rand_pts[4], rand_pts[5]]])
M = cv2.getAffineTransform(pts1,pts2)
return cv2.warpAffine(image, M, (cols,rows))
# examine the results of the random_transform function
index = random.randint(0, len(X_equalized) - 1)
img = X_equalized[index].squeeze()
test_transformation = img
for i in range(10):
test_transformation = np.hstack((test_transformation, random_transform(img, 6.0)))
plt.figure(figsize=(10,3))
plt.xlabel(human_readable_label[y_raw[index]])
plt.ylabel(index)
plt.title("Original image, left, compared to 10 random transformations")
plt.tick_params(
axis='both',
which='both',
bottom=False,
top=False,
left=False,
right=False,
labelleft=False,
labelbottom=False)
plt.imshow(test_transformation)
plt.show()
### Augment the training data using random transform.
# Number of transformed versions to make of the data set.
# Rather than augmenting the data to start I am augmenting
# each batch. This offers more flexibility in epoch count
# with reduced concern of overfitting. Hence, the augment_count = 0
augment_count = 0
X_augmented = list(X_equalized)
y_augmented = list(y_raw)
print("Original data size =", len(X_equalized), len(y_raw))
for i in range(augment_count):
# generate batch of new images
X_batch = [random_transform(img, 8) for img in X_equalized]
y_batch = list(y_raw)
X_augmented += X_batch
y_augmented += y_batch
print("Augmented data size =", len(X_augmented), len(y_augmented))
### examine the results of the random_transform function
index = [random.randint(0, len(X_equalized) - 1)]
index += [index[0] + (1+i)*len(X_equalized) for i in range(augment_count)]
show_signs(X_augmented, y_raw, index, "Original image compared to transformed versions")
### Shuffle the augmented data set and split into training and validation sets
from sklearn.model_selection import train_test_split
# shuffle data
X_shuffled, y_shuffled = shuffle(X_augmented, y_augmented)
# Ratio of the data to split for validation set
validation_ratio = 0.2
# Split the training set into a train and validation set.
X_train, X_validation, y_train, y_validation = train_test_split(
X_shuffled,
y_shuffled,
test_size = validation_ratio,
random_state=121
)
### Examine the set split. Check the labels match the images.
random_sign_type = random.randint(0, n_classes-1)
index = random.sample(find_signs(y_train, random_sign_type), 12)
show_signs(X_train, y_train, index, "Train Sample", False)
index = random.sample(find_signs(y_validation, random_sign_type), 12)
show_signs(X_validation, y_validation, index, "Validation Sample", False)
print("Training Set Size:", len(X_train))
print("Validation Set Size:", len(X_validation))
Describe how you set up the training, validation and testing data for your model. Optional: If you generated additional data, how did you generate the data? Why did you generate the data? What are the differences in the new dataset (with generated data) from the original dataset?
Answer:
I've randomized the data and then split off 20% for a validation set. Originally I augmented the data set with random translation, scale, rotation, and skew using OpenCV's affine transform. Thinking about it further, I decided to transform the batches during training instead, to reduce overfitting and improve generalization. The data augmentation method is still available above if augmenting prior to training is desired. However, the during-training augmentation seems to work well, and it removes one parameter that would otherwise need tuning: how much to augment the data. I would be interested in exploring the pros and cons of this method. You can see the random transformation function being used on the training batch in cell 117.
I was originally tempted to augment the data so that the count for each sign type is similar. I decided against it: there is information in the class frequencies that the model can use to its advantage to improve classification. If a sign is rare, it makes sense for the model to be biased toward a similar-looking sign that is more frequent.
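The class frequencies underlying this decision can be read off with `np.bincount` (`y_demo` here is a made-up label array for illustration, not the real training labels):

```python
import numpy as np

# Made-up labels with an imbalanced class distribution.
y_demo = np.array([0, 0, 0, 0, 1, 1, 2])

# Per-class counts, and the empirical frequencies that act as an implicit
# prior when the imbalance is left in the training set.
counts = np.bincount(y_demo, minlength=3)
priors = counts / counts.sum()
```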
### Helper to determine dimensions after convolution
def conv_size(input_size, filter_size, stride, padding):
""" Determine the size of the output after a convolution or max pool
input_size -- [width, height]
filter_size -- [width, height]
stride -- filter shift
padding -- 'SAME' ('s'/'S') for same padding; anything else is treated as 'VALID'
"""
input_width = input_size[0]
input_height = input_size[1]
filter_width = filter_size[0]
filter_height = filter_size[1]
# output_width = (input_width - filter_width + 2*padding)/stride + 1
# output_height = (input_height - filter_height + 2*padding)/stride + 1
if padding == 'SAME' or padding == 's' or padding == 'S':
output_height = math.ceil(float(input_height) / float(stride))
output_width = math.ceil(float(input_width) / float(stride))
else: # Valid Padding
output_height = math.ceil(float(input_height - filter_height + 1) / float(stride))
output_width = math.ceil(float(input_width - filter_width + 1) / float(stride))
return [int(output_width), int(output_height)]
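The per-dimension rules above match TensorFlow's SAME/VALID output-size formulas. As a quick sanity check, `out_dim` below is a standalone, one-dimensional mirror of the same rule (a hypothetical helper, not part of the model code):

```python
import math

def out_dim(n, k, stride, padding):
    """One-dimensional output size for a convolution or pool,
    using TensorFlow's SAME/VALID conventions."""
    if padding == 'SAME':
        return math.ceil(n / stride)
    return math.ceil((n - k + 1) / stride)  # VALID

# 32-wide input, 5x5 kernel, stride 1, SAME  -> stays 32
# 32-wide input, 2x2 kernel, stride 2, VALID -> halves to 16
```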
### Bottleneck residual layer creator
def bottle_neck(kernel_size, input_filters, output_filters):
""" Create layer definition stack for use with network_builder
kernel_size -- int, kernel height and width
input_filters -- number of input filters of the layer set
output_filters -- number of output filters of the layer set
"""
if input_filters == output_filters:
reduced_filters = input_filters//4
return [[[1, 1, 1, 'SAME'], reduced_filters],
[[kernel_size, kernel_size, 1, 'SAME'], reduced_filters],
[[1, 1, 1, 'SAME'], output_filters]]
else:
reduced_filters = output_filters//2
return [[[1, 1, 1, 'SAME'], reduced_filters],
[[kernel_size, kernel_size, 1, 'SAME'], reduced_filters],
[[1, 2, 2, 'SAME'], output_filters]]
def res_block(kernel_size, block_size, filters):
block = []
for i in range(block_size):
block += [[[kernel_size, kernel_size, 1, 'SAME'], filters],
[[kernel_size, kernel_size, 1, 'SAME'], filters]]
return block
def product(iterable):
p = 1
for i in iterable:
p *= i
return p
def network_builder(input_size, network, verbose = True):
if verbose:
print("Input")
print(input_size)
print(product(input_size))
print()
layer = input_size
size = product(layer)
parameter_count = 0
layer_index = 0
for network_layer in network:
layer_index += 1
if len(network_layer) == 2:
kernel = network_layer[0]
depth = network_layer[1]
layer = conv_size(layer[0:2], kernel[0:2], kernel[2], kernel[3])
layer.append(depth)
parameter_count += product(kernel[0:2])*depth
if verbose:
print("Convolution", layer_index)
print(layer, "after",kernel[0:2],"convolution")
print("param:",parameter_count, "dim:",product(layer))
print()
elif len(network_layer) == 1:
layer = [1, 1]
depth = network_layer[0]
layer.append(depth)
parameter_count += product(layer)
if verbose:
print("Fully Connected", layer_index)
print(layer, product(layer))
else:
print("ERROR: Undefined Layer")
print("Total Layers:\t", layer_index)
print("Neurons:\t",parameter_count,"\n")
input_size = [32, 32, 3]
print("LeNet - Basic")
filters = [16, 32, 600, 120, 43]
network = [[[5, 5, 1, 'SAME'], filters[0]]]
network += [[[2, 2, 2, 'VALID'], filters[0]]]
network += [[[5, 5, 1, 'SAME'], filters[1]]]
network += [[[2, 2, 1, 'VALID'], filters[1]]]
network += [[filters[2]], [filters[3]], [filters[4]]]
# for layer in network:
# print(layer)
network_builder(input_size, network, verbose = False)
print("LeNet")
filters = [32, 64, 1024, 512, 43]
network = [[[5, 5, 1, 'SAME'], filters[0]]]
network += [[[2, 2, 2, 'VALID'], filters[0]]]
network += [[[5, 5, 1, 'SAME'], filters[1]]]
network += [[[2, 2, 1, 'VALID'], filters[1]]]
network += [[filters[2]], [filters[3]], [filters[4]]]
# for layer in network:
# print(layer)
network_builder(input_size, network, verbose = False)
print("ResNet n3")
n = [3, 3, 3]
filters = [16, 32, 64, 1000, 10]
network = [[[3, 3, 1, 'SAME'], filters[0]]]
network += res_block(3, n[0], filters[0])
network += res_block(3, n[1], filters[1])
network += res_block(3, n[2], filters[2])
network += [[filters[3]], [filters[4]]]
# for layer in network:
# print(layer)
network_builder(input_size, network, verbose = False)
print("ResNet v1")
print("Accuracy:\t 99.0%")
n = [3, 4, 5]
filters = [16, 32, 64, 1024, 43]
network = [[[3, 3, 1, 'SAME'], filters[0]]]
network += bottle_neck(3, filters[0], filters[0]) * (n[0]-1)
network += bottle_neck(3, filters[0], filters[1])
network += bottle_neck(3, filters[1], filters[1]) * (n[1]-1)
network += bottle_neck(3, filters[1], filters[2])
network += bottle_neck(3, filters[2], filters[2]) * n[2]
network += [[filters[3]], [filters[4]]]
# for layer in network:
# print(layer)
network_builder(input_size, network, verbose = False)
print("ResNet v2")
print("Accuracy:\t 99.35%")
n = [16, 12, 8]
filters = [16, 32, 64, 1024, 43]
# Convolution
network = [[[3, 3, 1, 'SAME'], filters[0]]]
# Bottleneck 1
network += bottle_neck(3, filters[0], filters[0]) * (n[0]-1)
network += bottle_neck(3, filters[0], filters[1])
# Bottleneck 2
network += bottle_neck(3, filters[1], filters[1]) * (n[1]-1)
network += bottle_neck(3, filters[1], filters[2])
# Bottleneck 3
network += bottle_neck(3, filters[2], filters[2]) * n[2]
# Fully Connected
network += [[filters[3]], [filters[4]]]
# for layer in network:
# print(layer)
network_builder(input_size, network, verbose = False)
### Helper function for constructing the network
from tensorflow.contrib.layers import flatten
import tensorflow as tf
# Weight variable constructor
def weight_variable(shape):
initial = tf.truncated_normal(shape, stddev=0.1)
return tf.Variable(initial)
# Bias variable constructor
def bias_variable(shape):
# apply small positive shift to try and avoid dead ReLUs
initial = tf.constant(0.1, shape=shape)
return tf.Variable(initial)
# Convolutional Layer with ReLU activation
def conv2d_layer(features, weights, bias):
x = tf.nn.conv2d(features, weights, strides=[1, 1, 1, 1], padding='SAME')
x = tf.nn.bias_add(x, bias)
return tf.nn.relu(x)
def color_map_layer(features, weights, bias):
x = tf.nn.conv2d(features, weights, strides=[1, 1, 1, 1], padding='VALID')
x = tf.nn.bias_add(x, bias)
return tf.nn.tanh(x)
# 2x2 Max pool
def max_pool_2x2(features):
return tf.nn.max_pool(features, ksize=[1, 2, 2, 1],
strides=[1, 2, 2, 1], padding='VALID')
# Fully connected layer with ReLU activation and dropout
def fully_connected(features, weights, bias, dropout):
x = tf.matmul(features, weights)
x = tf.add(x, bias)
x = tf.nn.relu(x)
return tf.nn.dropout(x, dropout)
def res_layer(features, kernel, layers, output_filters):
input_filters = features.get_shape().as_list()[-1]
x_a = features
for i in range(layers-1):
x_a = conv2d_layer(x_a,
weight_variable([kernel, kernel, input_filters, input_filters]),
bias_variable([input_filters]))
x_a = tf.nn.conv2d(x_a,
weight_variable([kernel, kernel, input_filters, output_filters]),
strides=[1, 2, 2, 1],
padding='SAME')
x_a = tf.nn.bias_add(x_a, bias_variable([output_filters]))
x_b = tf.nn.conv2d(features,
weight_variable([1, 1, input_filters, output_filters]),
strides=[1, 2, 2, 1],
padding='SAME')
x = tf.add(x_a, x_b)
return tf.nn.relu(x)
def res_bottleneck(features, layers, input_filters, output_filters):
a_filters = input_filters // 4
b_filters = output_filters // 2
x = features
### normal bottle neck layers
for i in range(layers):
x_a = x
x_b = x
x_a = conv2d_layer(x_a,
weight_variable([1, 1, input_filters, a_filters]),
bias_variable([a_filters]))
x_a = conv2d_layer(x_a,
weight_variable([3, 3, a_filters, a_filters]),
bias_variable([a_filters]))
x_a = tf.nn.conv2d(x_a,
weight_variable([1, 1, a_filters, input_filters]),
strides=[1, 1, 1, 1],
padding='SAME')
x_a = tf.nn.bias_add(x_a, bias_variable([input_filters]))
x = tf.add(x_a, x_b)
x = tf.nn.relu(x)
### dimension change layers
if input_filters != output_filters:
std_dim_reduction = True
x_a = x
x_b = x
if std_dim_reduction:
x_a = tf.nn.conv2d(x_a,
weight_variable([1, 1, input_filters, b_filters]),
strides=[1, 2, 2, 1],
padding='SAME')
else:
x_a = max_pool_2x2(x_a)
x_a = tf.nn.conv2d(x_a,
weight_variable([1, 1, input_filters, b_filters]),
strides=[1, 1, 1, 1],
padding='SAME')
x_a = conv2d_layer(x_a,
weight_variable([3, 3, b_filters, b_filters]),
bias_variable([b_filters]))
x_a = tf.nn.bias_add(x_a, bias_variable([b_filters]))
x_a = tf.nn.conv2d(x_a,
weight_variable([1, 1, b_filters, output_filters]),
strides=[1, 1, 1, 1],
padding='SAME')
x_a = tf.nn.bias_add(x_a, bias_variable([output_filters]))
# Selection of dim change method
if std_dim_reduction:
# Standard ResNet dim change method
# feature reduction and depth increase with convolution with stride>1
x_b = tf.nn.conv2d(x_b,
weight_variable([1, 1, input_filters, output_filters]),
strides=[1, 2, 2, 1],
padding='SAME')
else:
# Non-standard method for dim change
# feature reduction with max pool and depth increase with convolution with stride=1
x_b = max_pool_2x2(x_b)
x_b = tf.nn.conv2d(x_b,
weight_variable([1, 1, input_filters, output_filters]),
strides=[1, 1, 1, 1],
padding='SAME')
x_b = tf.nn.bias_add(x_b, bias_variable([output_filters]))
x = tf.add(x_a, x_b)
x = tf.nn.relu(x)
return x
### ResNet with bottleneck blocks
def resNet(image, dropout):
# resNet setup
filters = [16, 32, 64, 1024]
layers = [16, 12, 8]
# start the network flow with the input image
flow = image
# Inception Layer: Input = 32x32x3, Output = 32x32x16
# flow_1x1 = conv2d_layer(flow,
# weight_variable([1, 1, 3, inception_filters[0]]),
# bias_variable([inception_filters[0]]))
# flow_3x3 = conv2d_layer(flow,
# weight_variable([3, 3, 3, inception_filters[1]]),
# bias_variable([inception_filters[1]]))
# flow_5x5 = conv2d_layer(flow,
# weight_variable([5, 5, 3, inception_filters[2]]),
# bias_variable([inception_filters[2]]))
# flow = tf.concat(2, [flow_1x1, flow_3x3, flow_5x5])
# Convolution 1: Input = 32x32x3, Output = 32x32x16
flow = conv2d_layer(flow,
weight_variable([3, 3, 3, filters[0]]),
bias_variable([filters[0]]))
# Residual 1: Input = 32x32x16, Output = 16x16x32
flow = res_bottleneck(flow, layers[0]-1, filters[0], filters[1])
# Residual 2: Input = 16x16x32, Output = 8x8x64
flow = res_bottleneck(flow, layers[1]-1, filters[1], filters[2])
# Residual 3: Input = 8x8x64, Output = 8x8x64
flow = res_bottleneck(flow, layers[2], filters[2], filters[2])
# Flatten. Input = 8x8x64 Output = 4096
flow = tf.contrib.layers.flatten(flow)
# Fully Connected 1. Input = 4096, Output = 1024
flow = fully_connected(flow,
weight_variable([8*8*filters[2], filters[3]]),
bias_variable([filters[3]]), dropout)
# Fully Connected 2. Input = 1024, Output = 43
flow = tf.add(tf.matmul(flow, weight_variable([filters[3], n_classes])), bias_variable([n_classes]))
return flow
### LeNet 2 conv layers with max pool and 3 fully connected layers
def LeNet(image, dropout):
weights = {
'color_map': weight_variable([1, 1, 3, 3]),
'conv_1': weight_variable([3, 3, 3, 64]),
'conv_2': weight_variable([3, 3, 64, 64]),
'fc_1': weight_variable([4096, 1024]),
'fc_2': weight_variable([1024, 512]),
'fc_3': weight_variable([512, n_classes])
}
biases = {
'color_map': bias_variable([3]),
'conv_1': bias_variable([64]),
'conv_2': bias_variable([64]),
'fc_1': bias_variable([1024]),
'fc_2': bias_variable([512]),
'fc_3': bias_variable([n_classes])
}
# start the network flow with the input image
flow = image
# 1, Color Map: Convolutional. Filter = 1x1x3. Input = 32x32x3. Output = 32x32x3.
flow = color_map_layer(flow, weights['color_map'], biases['color_map'])
# 2, Conv 1: Convolutional. Filter = 3x3x64. Input = 32x32x3. Output = 32x32x64.
flow = conv2d_layer(flow, weights['conv_1'], biases['conv_1'])
# Pooling. Input = 32x32x64. Output = 16x16x64.
flow = max_pool_2x2(flow)
# 3, Conv 2: Convolutional. Filter = 3x3x64. Input = 16x16x64. Output = 16x16x64.
flow = conv2d_layer(flow, weights['conv_2'], biases['conv_2'])
# Pooling. Input = 16x16x64. Output = 8x8x64.
flow = max_pool_2x2(flow)
# 4, Flatten. Input = 8x8x64. Output = 4096.
flow = tf.contrib.layers.flatten(flow)
# 5, FC 1: Fully Connected. Input = 4096. Output = 1024.
flow = fully_connected(flow, weights['fc_1'], biases['fc_1'], dropout)
# 6, FC 2: Fully Connected. Input = 1024. Output = 512.
flow = fully_connected(flow, weights['fc_2'], biases['fc_2'], dropout)
# 7, FC 3: Fully Connected. Input = 512. Output = n_classes = 43.
return tf.add(tf.matmul(flow, weights['fc_3']), biases['fc_3'])
What does your final architecture look like? (Type of model, layers, sizes, connectivity, etc.) For reference on how to build a deep neural network using TensorFlow, see Deep Neural Network in TensorFlow from the classroom.
Answer:
I first implemented LeNet and played with various layer configurations. The basic LeNet achieved reasonable results for me with networks consistently achieving >98.5% accuracy.
However, didactically I wanted to experiment with a different architecture and explore a much deeper network. I settled on using a variation of a ResNet using bottleneck layers to try and reduce parameter count. This network has 110 hidden layers with 5747 neurons. It was interesting to try to push the parameter count down while pushing layer count up. Compared to many of the benchmark networks for this problem this network has orders of magnitude fewer neurons and achieves similar accuracy.
ReLU activations were used exclusively throughout the network.
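The parameter savings from the bottleneck design can be sketched with generic ResNet bookkeeping (C = 64 is an illustrative channel count; biases are omitted, and these are not counts measured from the model above):

```python
# Weight counts for one plain residual block versus one bottleneck block
# at C input/output channels. Generic ResNet arithmetic, for illustration.
C = 64

# Plain block: two 3x3 convolutions, C -> C each.
plain = 2 * (3 * 3 * C * C)

# Bottleneck block: 1x1 reduce to C/4, 3x3 at C/4, 1x1 expand back to C.
r = C // 4
bottleneck = (1 * 1 * C * r) + (3 * 3 * r * r) + (1 * 1 * r * C)
```

At the same depth the bottleneck block carries roughly a seventeenth of the weights, which is what makes 100+ layers affordable at a small parameter budget.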
### Train your model here.
### Feel free to use as many code cells as needed.
x = tf.placeholder(tf.float32, (None, 32, 32, 3))
y = tf.placeholder(tf.int32, (None))
one_hot_y = tf.one_hot(y, n_classes)
keep_prob = tf.placeholder(tf.float32)
RATE = 0.001
NO_IMPROVEMENT_STOP = 12 # Stop after no improvement over N Epochs
BATCH_SIZE = 128
TRAINING_DROPOUT = 0.8
IMAGE_JITTER = 6.0
# logits = LeNet(x, keep_prob)
logits = resNet(x, keep_prob)
cross_entropy = tf.nn.softmax_cross_entropy_with_logits(labels=one_hot_y, logits=logits)
loss_operation = tf.reduce_mean(cross_entropy)
optimizer = tf.train.AdamOptimizer(learning_rate = RATE)
training_operation = optimizer.minimize(loss_operation)
correct_prediction = tf.equal(tf.argmax(logits, 1), tf.argmax(one_hot_y, 1))
accuracy_operation = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
saver = tf.train.Saver()
def evaluate(X_data, y_data):
num_examples = len(X_data)
total_accuracy = 0
sess = tf.get_default_session()
for offset in range(0, num_examples, BATCH_SIZE):
batch_x, batch_y = X_data[offset: offset + BATCH_SIZE], y_data[offset: offset + BATCH_SIZE]
accuracy = sess.run(accuracy_operation, feed_dict={x: batch_x, y: batch_y, keep_prob: 1.0})
total_accuracy += (accuracy * len(batch_x))
return total_accuracy / num_examples
with tf.Session() as sess:
sess.run(tf.global_variables_initializer())
num_examples = len(X_train)
accuracy_history = []
print("Training...")
print()
keep_learning = True
while keep_learning:
X_train, y_train = shuffle(X_train, y_train)
for offset in range(0, num_examples, BATCH_SIZE):
end = offset + BATCH_SIZE
batch_x, batch_y = X_train[offset:end], y_train[offset:end]
# apply random transformation to image batch
batch_x = [random_transform(img, IMAGE_JITTER) for img in batch_x]
sess.run(training_operation, feed_dict={x: batch_x, y: batch_y, keep_prob: TRAINING_DROPOUT})
validation_accuracy = evaluate(X_validation, y_validation)
accuracy_history.append(validation_accuracy)
# check for accuracy improvement
if len(accuracy_history) > NO_IMPROVEMENT_STOP:
keep_learning = max(accuracy_history[-NO_IMPROVEMENT_STOP:]) > max(accuracy_history[:-NO_IMPROVEMENT_STOP])
print("EPOCH {} ...".format(len(accuracy_history)))
print("Validation Accuracy = {:.3f}".format(validation_accuracy))
print()
# saver.save(sess, "tmp/model_resNet")
saver.save(sess, "tmp/model_resNet2")
print("Model saved")
print("Previous Best = {:.3f}".format(accuracy_history[-1]))
dvde = [accuracy_history[i+1] - accuracy_history[i] for i in range(len(accuracy_history) - 1)]
plt.plot(accuracy_history)
plt.xlabel("Epochs")
plt.ylabel("Accuracy on Validation Set")
plt.title("Rate: %0.4f Epochs: %i Batch: %i Dropout: %0.2f" % (RATE, len(accuracy_history), BATCH_SIZE, TRAINING_DROPOUT))
plt.show()
plt.plot(dvde)
plt.xlabel("Epochs")
plt.ylabel("Rate of change of accuracy")
plt.title("Rate: %0.4f Epochs: %i Batch: %i Dropout: %0.2f" % (RATE, len(accuracy_history), BATCH_SIZE, TRAINING_DROPOUT))
plt.show()
### Take a look at the accuracy of the network on the original data
with tf.Session() as sess:
saver.restore(sess, "tmp/model_resNet2")
test_accuracy = evaluate(X_equalized, y_raw)
print("Accuracy on equalized train images = {:.3f}".format(test_accuracy))
How did you train your model? (Type of optimizer, batch size, epochs, hyperparameters, etc.)
Answer:
Rather than spend time tweaking parameters, I spent most of my time experimenting with the network architecture, leaving the hyperparameters relatively constant. I used the Adam optimizer throughout so I could focus on architecture rather than learning rate schedules. For most of my development I used the values set in the training cell above: learning rate 0.001, batch size 128, training dropout keep probability 0.8, and image jitter of 6.0.
I then changed from a fixed epoch count to terminating after 12 epochs without improvement. Again, this was an effort to eliminate training parameters and keep the focus on trying different networks.
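The stopping rule can be sketched as a standalone function (`should_keep_learning` is a hypothetical name mirroring the logic in the training loop above):

```python
def should_keep_learning(history, patience):
    """Keep training while the best accuracy in the last `patience` epochs
    still beats the best accuracy from all earlier epochs."""
    if len(history) <= patience:
        return True
    return max(history[-patience:]) > max(history[:-patience])
```

With `patience = 12` this matches the `NO_IMPROVEMENT_STOP` behaviour: training continues as long as a recent epoch holds the record, and stops once twelve epochs pass without a new best.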
What approach did you take in coming up with a solution to this problem? It may have been a process of trial and error, in which case, outline the steps you took to get to the final solution and why you chose those steps. Perhaps your solution involved an already well known implementation or architecture. In this case, discuss why you think this is suitable for the current problem.
Answer: I started with LeNet since I had some familiarity with the network and didn't have to reinvent the wheel. The convolutional layout of LeNet is an obvious choice for this project. From there I added dropout to the fully connected layers and experimented with adding additional convolution layers, max pooling, and fully connected layers.
I wanted to try implementing a different architecture as a didactic exercise. Since I am fairly limited in computational resources, I wanted to minimize the parameter count as much as possible. I really liked the intuitive nature of ResNet and thought I could use the bottleneck architecture to drastically reduce parameter count while still maintaining a deep network.
During the dimensional reduction of the feature maps, I simply used a 1x1 convolution with a stride of 2. I find this method interesting in that it disregards 75% of the data. A max or average pool seems like an obvious choice for dimensionality reduction that adds no parameters yet isn't quite so naive, and I'm puzzled that it was not employed in the ResNet paper, https://arxiv.org/pdf/1512.03385v1.pdf. In my experiments, using a max pool for dimensionality reduction drastically increased training time with minimal, if any, increase in performance.
To see the impact of layer count I pushed the network beyond 100 layers while maintaining a neuron count of around 5000, which is drastically smaller in parameter count than some of the simpler LeNet architectures I started with.
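The claim that a 1x1, stride-2 convolution discards 75% of the feature map can be checked with a small NumPy sketch (a single-channel toy example, not the notebook's actual network):

```python
import numpy as np

# Toy 4x4 single-channel feature map.
fmap = np.arange(16, dtype=float).reshape(4, 4)

# A 1x1 convolution with stride 2 only ever reads every other row and
# column; the remaining 12 of 16 positions (75%) never touch the output.
downsampled = fmap[::2, ::2]

# 2x2 average pooling with stride 2 uses every input position instead.
pooled = fmap.reshape(2, 2, 2, 2).mean(axis=(1, 3))

print(downsampled.size / fmap.size)  # 0.25 of the positions survive
```

The contrast makes the trade-off concrete: both halve each spatial dimension without adding parameters, but only the pooling path aggregates information from all input positions.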
Take several pictures of traffic signs that you find on the web or around you (at least five), and run them through your classifier on your computer to produce example results. The classifier might not recognize some local signs but it could prove interesting nonetheless.
You may find signnames.csv useful as it contains mappings from the class id (integer) to the actual sign name.
Use the code cell (or multiple code cells, if necessary) to implement the first step of your project. Once you have completed your implementation and are satisfied with the results, be sure to thoroughly answer the questions that follow.
### Load the images and plot them here.
### Feel free to use as many code cells as needed.
### Import new signs
import cv2
import glob
file = "test_images/*.jpg"
# file = "test_images/full_set/*.jpg"
new_signs = []
for img in glob.glob(file):
    temp_img = cv2.imread(img)
    temp_img = cv2.resize(temp_img, (32, 32))
    # cv2.imread returns BGR; convert to RGB for display and training
    temp_img = cv2.cvtColor(temp_img, cv2.COLOR_BGR2RGB)
    new_signs.append(temp_img)
new_signs_labels = [0 for i in range(len(new_signs))]
show_signs(new_signs, new_signs_labels, range(len(new_signs)), "Hey, some new signs!", False, False)
# print(new_signs_labels)
### Equalize new images
new_signs_eq = [equalize_image(image) for image in new_signs]
### Compare and assign label
# new_signs_labels = [2, 17, 38, 2, 27, 33, 17, 35, 14, 15, 17, 5, 13, 35, 13, 29, 29, 33]
new_signs_labels = [38, 17, 14, 5, 33]
for i in range(len(new_signs)):
    show_signs(list(new_signs) + list(new_signs_eq), new_signs_labels, [i, i + len(new_signs)], "")
Choose five candidate images of traffic signs and provide them in the report. Are there any particular qualities of the image(s) that might make classification difficult? It could be helpful to plot the images in the notebook.
Answer:
Each of the signs has some aspect that might make it difficult to classify. The image quality of the new signs is much higher in most cases than that of the training images: colors are much more saturated and contrast is much higher. Because each batch of training images was jittered, I wouldn't expect the angle, size, skew, or position to have much impact on classification accuracy.
Some aspects of each sign that may hinder classification accuracy:
Is your model able to perform equally well on captured pictures when compared to testing on the dataset? The simplest way to do this is to check the accuracy of the predictions. For example, if the model predicted 1 out of 5 signs correctly, it's 20% accurate.
NOTE: You could check the accuracy manually by using signnames.csv (same directory). This file has a mapping from the class id (0-42) to the corresponding sign name. So, you could take the class id the model outputs, lookup the name in signnames.csv and see if it matches the sign from the image.
Answer:
In the small sample set of five signs, the network correctly classified four, for an accuracy of 80%. This is a far cry from the 99% accuracy that the model was achieving on the training and validation sets. However, it falls within an 80% confidence interval for the hypothesis that the actual accuracy is at least 99%.
def top_k(input, rank):
    input = tf.nn.softmax(input)
    return tf.nn.top_k(input, rank)
### Run the model on the new data
with tf.Session() as sess:
    saver.restore(sess, "tmp/model_resNet2")
    test_accuracy = evaluate(new_signs_eq, new_signs_labels)
    print("Accuracy on new images = {:.3f}".format(test_accuracy))
    top_5_new = sess.run(top_k(logits, 5), feed_dict={x: new_signs_eq, y: new_signs_labels, keep_prob: 1.0})
# Confidence in model accuracy
from math import sqrt

def mean(l):
    return float(sum(l)) / len(l)

def var(l):
    m = mean(l)
    return sum([(x - m)**2 for x in l]) / len(l)

def factor(l):
    return 1.53  # 80%, two-tail, for 4 degrees of freedom

def conf(l):
    return factor(l) * sqrt(var(l) / len(l))

def test(l, h):
    interval = conf(l)
    mu = mean(l)
    lower = mu - interval
    upper = mu + interval
    return h > lower and h < upper
l = [0, 1, 1, 1, 1]
# Alternative hypothesis: the actual model accuracy is at least 99%
H_a = 0.99
print("80%% confidence interval for classification accuracy is %2.0f%% to %3.0f%%." %
      (((mean(l) - conf(l)) * 100), (mean(l) + conf(l)) * 100))
print("Therefore, the hypothesis test, with 80%% confidence, that the actual accuracy is %2.0f%% is %s." %
      (H_a * 100, test(l, H_a)))
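The hard-coded `1.53` in `factor` above is the two-tailed 80% Student-t critical value for 4 degrees of freedom. If SciPy is available, the value can be computed instead of hard-coded; this is a sketch, not part of the original notebook:

```python
from scipy.stats import t

def t_critical(confidence, n):
    """Two-tailed Student-t critical value for a sample of size n.

    For an 80% confidence level, alpha = 0.2 and each tail holds 0.1,
    so we need the 0.9 quantile of the t distribution with n-1 dof.
    """
    alpha = 1.0 - confidence
    return t.ppf(1.0 - alpha / 2.0, df=n - 1)

print(round(t_critical(0.80, 5), 2))  # ~1.53 for df = 4
```

Using `t_critical(0.80, len(l))` in place of `factor(l)` would also keep the code correct if the sample size ever changes.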
Use the model's softmax probabilities to visualize the certainty of its predictions, tf.nn.top_k could prove helpful here. Which predictions is the model certain of? Uncertain? If the model was incorrect in its initial prediction, does the correct prediction appear in the top k? (k should be 5 at most)
tf.nn.top_k will return the values and indices (class ids) of the top k predictions. So if k=3, for each sign, it'll return the 3 largest probabilities (out of a possible 43) and the corresponding class ids.
Take this numpy array as an example:
# (5, 6) array
a = np.array([[ 0.24879643, 0.07032244, 0.12641572, 0.34763842, 0.07893497,
0.12789202],
[ 0.28086119, 0.27569815, 0.08594638, 0.0178669 , 0.18063401,
0.15899337],
[ 0.26076848, 0.23664738, 0.08020603, 0.07001922, 0.1134371 ,
0.23892179],
[ 0.11943333, 0.29198961, 0.02605103, 0.26234032, 0.1351348 ,
0.16505091],
[ 0.09561176, 0.34396535, 0.0643941 , 0.16240774, 0.24206137,
0.09155967]])
Running it through sess.run(tf.nn.top_k(tf.constant(a), k=3)) produces:
TopKV2(values=array([[ 0.34763842, 0.24879643, 0.12789202],
[ 0.28086119, 0.27569815, 0.18063401],
[ 0.26076848, 0.23892179, 0.23664738],
[ 0.29198961, 0.26234032, 0.16505091],
[ 0.34396535, 0.24206137, 0.16240774]]), indices=array([[3, 0, 5],
[0, 1, 4],
[0, 5, 1],
[1, 3, 5],
[1, 4, 3]], dtype=int32))
Looking just at the first row we get [ 0.34763842, 0.24879643, 0.12789202], you can confirm these are the 3 largest probabilities in a. You'll also notice [3, 0, 5] are the corresponding indices.
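The same top-k result can be reproduced in plain NumPy with `argsort`, which makes a handy sanity check against the TensorFlow output (a sketch using the first row of the example array above):

```python
import numpy as np

def numpy_top_k(a, k):
    """Return (values, indices) of the k largest entries per row,
    sorted in descending order, mirroring tf.nn.top_k."""
    indices = np.argsort(a, axis=1)[:, ::-1][:, :k]
    values = np.take_along_axis(a, indices, axis=1)
    return values, indices

a = np.array([[0.24879643, 0.07032244, 0.12641572,
               0.34763842, 0.07893497, 0.12789202]])
values, indices = numpy_top_k(a, 3)
print(indices[0])  # [3 0 5], matching the first row of the example
```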
Answer:
The model was very confident in its classification of all of the signs except the first "Keep Right" sign. That sign does have the correct classification in its top 5. I would guess that the high skew of the sign is leading to the poor classification; however, this is merely conjecture.
I didn't anticipate that it would do such a good job on my terrible, hand-drawn stop sign. This does show a good degree of generalization.
def examine_classification(images, labels, top_k_output, index=None):
    """Plot each image alongside its top-k classification probabilities.

    images -- classified images
    labels -- true label for each image
    top_k_output -- output from tf.nn.top_k containing probabilities and labels
    index -- indices of specific items to show; default is to plot up to the first 20
    """
    if not index:
        if len(images) > 20:
            index = range(20)
        else:
            index = range(len(images))
    top_k_prob = top_k_output[0]
    top_k_label = top_k_output[1]
    for i in index:
        y_pos = np.arange(len(top_k_label[i]))
        probability = top_k_prob[i]
        label = [human_readable_label[k] for k in top_k_label[i]]
        fig = plt.figure(figsize=(4, 1.5))
        # Show image
        plt.subplot(1, 2, 1)
        plt.imshow(images[i])
        plt.xlabel(i)
        plt.title(human_readable_label[labels[i]])
        plt.tick_params(
            axis='both',
            which='both',
            bottom='off',
            top='off',
            left='off',
            right='off',
            labelleft='off',
            labelbottom='off'
        )
        # plt.axis('off')
        # Show probability distribution
        plt.subplot(1, 2, 2)
        plt.barh(y_pos, probability, align='center', alpha=0.4)
        plt.yticks(y_pos, label)
        plt.xlim([0, 1])
        # plt.xlabel('Probability')
        plt.title('Model Prediction')
        plt.tick_params(
            axis='both',
            which='both',
            bottom='off',
            top='off',
            left='off',
            right='on',
            labelleft='off',
            labelbottom='on',
            labelright='on'
        )
        plt.show()
def find_wrong_top_1(label, top_k):
    """Find the indices of the signs that were classified incorrectly."""
    # Compare against the function's own arguments, not globals
    correct = top_k[1][:, 0] == label
    return [i for i, j in enumerate(correct) if j == False]
def find_wrong_top_k(label, top_k):
    """Find the indices of the signs whose true label is not in the top k."""
    correct = [l in t for l, t in zip(label, top_k[1])]
    return [i for i, j in enumerate(correct) if j == False]
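A quick toy check of `find_wrong_top_k`'s logic, using a fabricated (label, top-k indices) pair rather than real model output:

```python
import numpy as np

def find_wrong_top_k(label, top_k):
    """Find the indices of samples whose true label is absent from the top k."""
    correct = [l in t for l, t in zip(label, top_k[1])]
    return [i for i, j in enumerate(correct) if j == False]

# Fabricated example: 3 samples, top-3 predicted class ids per sample.
labels = [2, 7, 1]
top_k = (None, np.array([[2, 5, 9],    # label 2 present -> correct
                         [1, 3, 4],    # label 7 missing -> wrong
                         [0, 1, 8]]))  # label 1 present -> correct
print(find_wrong_top_k(labels, top_k))  # [1]
```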
### Visualize the classification accuracy on the new signs
examine_classification(new_signs, new_signs_labels, top_5_new, index = None)
with tf.Session() as sess:
    saver.restore(sess, "tmp/model_resNet2")
    top_5_test = sess.run(top_k(logits, 5), feed_dict={x: X_test_equalized, y: y_test, keep_prob: 1.0})
    print("Test set run complete.")
### Find misclassified signs in the test set
test_wrong_index = find_wrong_top_k(y_test, top_5_test)
test_set_top1_error = len(test_wrong_index)/len(y_test)
test_set_accuracy = 1 - test_set_top1_error
print("This model achieved an accuracy of %3.2f%% on the test set" % (test_set_accuracy*100.))
print("With %i misclassified signs out of %i total signs in test set" % (len(test_wrong_index), len(y_test)))
### Examine classification errors of test set
examine_classification(X_test_equalized, y_test, top_5_test, test_wrong_index)
Note: Once you have completed all of the code implementations and successfully answered each question above, you may finalize your work by exporting the iPython Notebook as an HTML document. You can do this by using the menu above and navigating to File -> Download as -> HTML (.html). Include the finished document along with this notebook as your submission.
ResNet https://arxiv.org/pdf/1512.03385v1.pdf
Inception v4 https://arxiv.org/pdf/1602.07261v2.pdf